Preprocessing of data

Tricks for RNA-seq and other large data sets

Douwe Molenaar

Systems Biology Lab

2025-01-01

What we will discuss

  • Statistical hypothesis testing
  • Count data (like RNA-seq)
  • Variance stabilization
  • Negative binomial model for count data
  • Error control in analyses of large data sets

Assumed knowledge

  • Sample space \(\Omega\)
  • Event space, event
  • Probability of events
  • Probability distribution, probability distribution functions (pmf \(p(x)\), pdf \(f(x)\))
  • Random variables (R.V.)
  • Expected value of an R.V.: \(\mathbb{E}[X] = \sum_{x \in \Omega_X}{x \, p(x)} \quad \left (\text{or} \quad \mathbb{E}[X] = \int_{\Omega_X}{x f(x) \,\mathrm{d}x} \right)\)
  • Variance of an R.V.: \(\text{var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2]\)
  • I.I.D. samples \(x_i\) for an R.V.
  • Sample mean: \(\overline{x} = \frac{1}{n} \sum_{i=1}^n x_i\)
  • Sample variance (\(s^2\)) and standard deviation (\(s\)): \(s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \overline{x})^2 \qquad s = \sqrt{s^2}\)
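The sample statistics above are easy to check numerically. A minimal Python sketch (the data values are made up for illustration):

```python
import numpy as np

# I.i.d. samples of a random variable X (hypothetical values)
x = np.array([4.1, 3.8, 5.2, 4.6, 4.3])

n = x.size
x_bar = x.sum() / n                       # sample mean
s2 = ((x - x_bar) ** 2).sum() / (n - 1)   # unbiased sample variance (n-1 denominator)
s = np.sqrt(s2)                           # sample standard deviation

# NumPy computes the same quantities; ddof=1 gives the n-1 denominator
assert np.isclose(s2, np.var(x, ddof=1))
assert np.isclose(s, np.std(x, ddof=1))
```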

Lecture 1

Statistical hypothesis testing

Making the best of our ignorance of causes

Classifying variation in terms of causes

Biological causes

  • Variation of interest
    • Response to conditions, treatments
  • Variation between similar objects (mice, humans, bacterial cultures)
    • Noteworthy, but usually not the target of the experiment

Technical causes (“errors”)

  • Variation in sample preparation
  • Differences in sample quality
  • Differences in amount of sample
  • Instrumental or methodical variation

Classifying variation in terms of accuracy and precision


Handling types of technical variation

Bias / systematic error

  1. Improve experimental set-up, instrument settings, sample selection, etc. for better accuracy
  2. Correct the biases (often called normalization) or
  3. Account for the biases in the statistical model

Random error / noise

  1. Improve experimental set-up, instrument settings, sample selection, etc. for better precision
  2. Investigate the probability distribution of the remaining error
  3. Use methods that can handle these probability distributions, like statistical tests

Goal of statistical hypothesis testing

Does the distribution of a random variable \(X\) depend on another variable (condition, disease, etc.)?

  • The precision or dispersion will play a role in the answer
  • The answer will be given in terms of probabilities
  • Performing t-tests on the two examples shown:
    • Low precision: p-value = 0.152
    • High precision: p-value = 0.00014

But what is the meaning of these p-values?

A case of statistical hypothesis testing

We measure concentrations \(a\) and \(b\) of a metabolite in a person under conditions A and B. As the “boring/nothing special” hypothesis we propose the following null hypothesis, \(H_0\):

There is no difference between concentrations \(a\) and \(b\)

More precisely

\(H_0\): \(a\) and \(b\) are drawn from distributions with the same mean

To be useful in statistical testing we need the following information:

  • The type of distribution from which \(a\) and \(b\) were drawn (e.g. a normal distribution)
  • Other parameters of the distribution are known or can be estimated (e.g. \(\sigma\))

Definition of the p-value

p-value: The probability of measuring the observed value or a more extreme value, when \(H_0\) is true

  • If \(H_0\) is true then the density of the t-statistic (\(t\)) is given by the dark-blue curve
  • Analysis of an experiment yields a \(t\) indicated by the yellow dot (say, the value is \(-d\))
  • The p-value of that observation equals the sum of the red areas:

\[\text{p-value}=P(t \in \left(-\infty,-d\right] \cup \left[d,\infty \right))\]
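The two-sided p-value above can be computed from the t distribution's tail probabilities. A sketch with illustrative values for \(d\) and the degrees of freedom:

```python
from scipy import stats

# Two-sided p-value for an observed t-statistic -d with df degrees of freedom:
# P(t <= -d) + P(t >= d). The values d and df are illustrative.
d, df = 2.5, 8
p_value = 2 * stats.t.sf(d, df)           # sf(d) = P(T >= d); symmetry doubles it

# Equivalent: add the two tail probabilities explicitly
p_alt = stats.t.cdf(-d, df) + stats.t.sf(d, df)
assert abs(p_value - p_alt) < 1e-12
```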

Questions about the previous slide

Make sure that you have the knowledge of statistical analyses to answer the following questions:

  1. What is \(H_0\) in the previous slide? What is the alternative hypothesis \(H_1\)?
  2. Describe a type of experiment in transcriptomics on which you can perform a t-test.
  3. How many measurements do you need at least to perform that t-test?
  4. (True/false) The alternative hypothesis is true when the p-value is below 0.05.

Statistical tests vs. the truth

  • The rejected cases are the interesting ones that the statistical test yields: the positive calls
  • False positives and false negatives are also called Type I and Type II errors, respectively
  • The p-value only tells us something about the cases in the first row, where \(H_0\) is true

Recapitulation: What is the meaning of p-value and \(\alpha\)?

The p-value of a statistical test is the probability that the observed or more extreme events happen if \(H_0\) is true.

The \(\alpha\) level, or “p-value cut-off”, is the accepted level of false positive conclusions or Type I errors, i.e. the probability of rejecting \(H_0\) when \(H_0\) is true.

  • We call this procedure, using an \(\alpha\) level for rejection of \(H_0\), Type I error control
    • We know that among the true \(H_0\) cases a fraction \(\alpha\) will be false positives

Questions to test your understanding of the p-value

  1. The p-value of a test was 0.03. This is the probability that you observe this or a more extreme test result by chance. Correct?
  2. The p-value of a test was 0.04. What does this tell you about the probability that we have a true positive?
  3. The p-value of a test was 0.01. This tells us that the probability of a Type II error is 0.01. Correct?

Answer to question 1:

Incorrect. This is a wrong interpretation of the p-value: something essential is missing. The assertion only holds when the null hypothesis is true. But we do not know whether the null hypothesis is true, neither before nor after performing a test! Beware of the internet: you will find wrong statements about the p-value everywhere, which harms the correct interpretation of statistical tests.

Count data

Exemplified by RNA-seq data

RNA sequencing and mapping sequences: RNA-seq

  • Isolate messenger RNA (mRNA)
  • Make DNA copies of the RNA
  • Add adaptors for sequencing
  • Read sequences of fragments

 

 

  • Map sequences to genes (ORFs)

mRNA molecules: those of the red gene are indicated

Notice variation in the two random samples:

  1. the number of RNA molecules per sample varies
  2. the fraction of red molecules per sample varies

Example of a table of RNA-seq counts

Five rows from the Pasilla data set:

| gene_id | untreated1 | untreated2 | untreated3 | untreated4 | treated1 | treated2 | treated3 |
|---|---|---|---|---|---|---|---|
| ... | ... | ... | ... | ... | ... | ... | ... |
| FBgn0020369 | 3387 | 4295 | 1315 | 1853 | 4884 | 2133 | 2165 |
| FBgn0020370 | 3186 | 4305 | 1824 | 2094 | 3525 | 1973 | 2120 |
| FBgn0020371 | 1 | 0 | 1 | 1 | 1 | 0 | 0 |
| FBgn0020372 | 38 | 84 | 29 | 28 | 63 | 28 | 27 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| TOTAL COUNT | 13972512 | 21911438 | 8358426 | 9841335 | 18670279 | 9571826 | 10343856 |

Notice:

  • Replicate samples of treated and untreated conditions
  • The variation in total count between replicates
  • The variation in counts per gene

Bias in RNA-seq counts: obvious effects

Variation in sequence fragments counted due to technical variation:

  • Variation in amount of isolated mRNA
  • Variation in quality of isolated mRNA
  • Variation in cDNA synthesis efficiency
  • Variation in sequencing efficiency

These all lead to variation in sequencing depth or total count between samples.

Normalization of total counts

  • Simple method
    • Assumption: the variation in total count is due to technical reasons, not because the concentrations of all mRNAs actually differ
    • Solution: divide the sequence count for each gene by the total count in the sample
    • Yields: RPM, Reads Per Million
  • Advanced method
    • Assumption: only a limited number of mRNAs differ between samples, for technical or biological reasons
    • Solution: perform robust linear regression
    • Yields: a size factor for each sample
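The simple method is a one-liner. A sketch using a few counts taken from the Pasilla table above (only the RPM method is shown; size-factor estimation is more involved):

```python
import numpy as np

# Toy count matrix: rows = genes, columns = samples (a subset of the Pasilla counts)
counts = np.array([[3387, 4295, 1315],
                   [  38,   84,   29],
                   [3186, 4305, 1824]])

total = counts.sum(axis=0)          # total count per sample
rpm = counts / total * 1e6          # reads per million (RPM)

# After RPM normalization every sample sums to exactly one million
assert np.allclose(rpm.sum(axis=0), 1e6)
```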

Expected properties of random count variable \(X\) for red

  1. It has a discrete distribution, because we count reads per target gene
  2. The probability \(p\) of sequencing a fragment of gene red by randomly picking one from all fragments equals its proportion among all fragments
  3. The probability \(p\) remains constant after the first, second, etc. fragment of red has been sequenced, because the number of sequenced fragments \(\ll\) available DNA fragments

Possible distribution for RNA-seq counts

  • We “draw” a very large sample \(n\) (millions) from RNA fragments.
  • The fraction of fragments \(p\) from gene red remains constant (does not decrease by earlier draws).
  • All subsequent experimental procedures do not change the fraction of sequenced cDNA fragments originating from the red gene.

Proposal: then \(X\) has a Poisson distribution with parameter \(\lambda\):

\[P \left( X = k \right) = \frac{\lambda^k e^{-\lambda}}{k!}\] where \(\lambda = p \times n\) is the expected number of red counts in a sample of \(n\) RNA fragments.
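The Poisson model above can be evaluated directly with SciPy; the fraction \(p\) and sample size \(n\) below are illustrative:

```python
from scipy import stats

# Expected count lambda = p * n: fraction p of 'red' fragments, n fragments drawn
p, n = 2e-6, 5_000_000
lam = p * n                                   # lambda = 10

# Probability of observing exactly k = 10 red counts
pmf_10 = stats.poisson.pmf(10, lam)

# Mean and variance of a Poisson distribution are both lambda
mean, var = stats.poisson.stats(lam, moments='mv')
assert float(mean) == lam and float(var) == lam
```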

Characteristics of the Poisson distribution

Shape for different values of \(\lambda\)

Mean equals variance

\[\lambda = \mu = \sigma^2 \qquad \rightarrow \sigma=\sqrt{\mu}\]

However: RNA-seq counts are not Poisson-distributed




  • The 4 Pasilla untreated RNA-seq replicates were used
  • The axes have logarithmic scales
  • The yellow line: \(s \propto \overline{x}^\color{red}{1}\), the red line: \(s \propto \overline{x}^\color{red}{\frac{1}{2}}\)
  • For a Poisson-distributed random variable \(\sigma^2=\mu\), hence \(s \propto \overline{x}^\frac{1}{2}\)
  • The observed slope is larger than \(\frac{1}{2}\), i.e. \(s\) increases more rapidly than expected (over-dispersion)
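This mean-dispersion diagnostic can be reproduced by simulation: generate over-dispersed counts and fit the slope of \(\log s\) against \(\log \overline{x}\). All numbers below (dispersion, gene count, replicate count) are illustrative, not the Pasilla values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate over-dispersed (negative binomial) counts for 200 'genes' over a
# range of mean expression levels, with 4 replicates each
mu = np.geomspace(10, 10_000, 200)
r = 10.0                                              # dispersion parameter
counts = rng.negative_binomial(r, r / (r + mu[:, None]), size=(200, 4))

m = counts.mean(axis=1)
s = counts.std(axis=1, ddof=1)
keep = (m > 0) & (s > 0)

# Slope of log(s) vs log(mean): Poisson data would give 1/2,
# over-dispersed data give a larger slope
slope = np.polyfit(np.log(m[keep]), np.log(s[keep]), 1)[0]
```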

Questions about count data

  • Using the property that for Poisson-distributed counts we expect \(s \propto \overline{x}^\frac{1}{2}\), argue why precision increases with larger sample sizes (more counts), i.e. your estimate of \(\lambda\) (=\(\mu\)) will be more precise.
  • Even though \(k > \frac{1}{2}\) in \(s \propto \overline{x}^k\), why is it still useful to have a larger sequencing depth of RNA-seq samples?
    • Or, for which value of \(k\) would it not be very useful to collect samples with larger sequence depth?
  • Is \(s \propto \overline{x}^k\) an accurate model for the dispersion of RNA-seq data? Please motivate your answer.

Lecture 2

Variance stabilization

What is variance?

Variance of a random variable

  • The variance of an R.V. \(X\) equals \(\text{var}(X) = \mathbb{E}\left[\left(X - \mathbb{E}[X]\right)^2\right]\)
  • The variance is a measure of the width of the distribution

Sample variance

  • Take I.I.D. samples for an R.V. \(X\), i.e. \(x_i\)
  • The sample variance is an unbiased estimator of \(\text{var}(X)\): \[s^2 = \frac{1}{n-1} \sum_{i=1}^n \left(x_i-\overline{x}\right)^2\]
  • The standard deviation is the square root of the variance \(s = \sqrt{s^2}\)

Variable variance

Definitions

  • homoscedasticity: variance of samples for \(X\) are constant
  • heteroscedasticity: variance of samples for \(X\) varies with their mean value

The problem of heteroscedasticity

  • Many statistical techniques assume homoscedasticity
    • E.g. all techniques that use least squares fitting (ANOVA)
  • Many machine learning algorithms will be affected by heteroscedasticity
    • E.g. clustering, dimension reduction, supervised learning
  • Graphs are often more difficult to interpret with heteroscedastic data

Investigate the variance before doing statistical tests!

Two options:

  1. Use a distribution and test that are tailored for the data
    • We will demonstrate this for RNA-seq data
  2. Transform the data so that the result is homoscedastic: a variance stabilizing transformation
    • Examples follow below

Many experimental variables have proportional error

  • Proportional error is the most common instrumental error: \(\sigma = a \cdot \mu\)
  • The standard deviation is a fixed percentage of the measured value
  • The data are heteroscedastic

  • On a logarithmic scale the slope equals 1: \[\log{(\sigma)} = \log{(a)} + \color{red}{1} \cdot \log{(\mu)}\]

A logarithmic transformation stabilizes the variance

  • Logarithmic transform is the most common data transformation used: \(y = \log{(x)}\)
  • The transformed data is homoscedastic

Stabilization of variance

  • Transform the data in such a way that the variance becomes independent of the mean
    • Logarithmic transform when error is a fixed percentage of the measured value
    • Square root transform for Poisson-distributed count data
    • In general: if \(\sigma \propto \mu^\beta\) then the transformation \(y = x^{1-\beta}\) will stabilize the variance
  • If transformed data also have normal-distributed error, you can apply t-tests, ANOVA etc.
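The square-root rule for Poisson counts (\(\beta = \frac{1}{2}\), so \(y = x^{1/2}\)) can be checked by simulation; the two means below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

# Poisson counts at very different means: sigma is proportional to mu^(1/2)
low = rng.poisson(10, size=100_000)
high = rng.poisson(1000, size=100_000)

# Raw standard deviations differ by a factor of about sqrt(1000/10) = 10
sd_ratio_raw = high.std() / low.std()

# The transform y = x^(1 - beta) = sqrt(x) makes the spread nearly constant
sd_low = np.sqrt(low).std()
sd_high = np.sqrt(high).std()
```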

Applying this to the Pasilla RNA-seq data

\(s \propto \overline{x}^{0.7} \longrightarrow\) transformation: \(y_i = x_i^{0.3}\)

Effect of variance stabilization on dimension reduction

UMAP on original data and on log-transformed data

Modeling RNA-seq with the Negative Binomial distribution

Modeling over-dispersed count data

  • The Poisson distribution with parameter \(\lambda\) did not fit the RNA-seq counts
    • It has a single parameter: \(\lambda=\mu\)
  • An alternative distribution is the Negative Binomial (NB) distribution
    • It is a discrete distribution
    • It has two parameters: mean \(\mu\) and dispersion parameter \(r\)

Comparing Poisson- and NB distributions

  • Using \(\mu=4\) for all examples
  • NB distribution with \(r=\) 1.5, 5, or 20
  • The larger \(r\) the smaller the standard deviation
  • As \(r \rightarrow \infty\), NB approaches Poisson
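SciPy parameterizes the NB distribution by \((n, p)\) rather than \((\mu, r)\); the standard conversion is \(n = r\), \(p = \frac{r}{r+\mu}\). A sketch verifying the mean and the over-dispersed variance \(\mu + \mu^2/r\):

```python
from scipy import stats

mu, r = 4.0, 5.0

# Convert (mu, r) to scipy's (n, p) parameterization
n, p = r, r / (r + mu)
mean, var = (float(v) for v in stats.nbinom.stats(n, p, moments='mv'))

assert abs(mean - mu) < 1e-9
assert abs(var - (mu + mu ** 2 / r)) < 1e-9   # var = mu + mu^2/r > mu
```

As \(r \to \infty\) the extra term \(\mu^2/r\) vanishes and the variance approaches the Poisson value \(\mu\).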

Fitting a model to RNA-seq data

  1. \(q^A_i\) and \(q^B_i\) are the concentrations of RNA for gene \(i\) under conditions \(A\) and \(B\).
  2. The log fold-difference of the concentrations \(\log{\left(\frac{q^B_i}{q^A_i}\right)}=\log{(q^B_i)}-\log{(q^A_i)}\) is a constant \(\beta^{AB}_i\).
  3. Normalization: The mean counts \(\mu^A_i\) and \(\mu^B_i\) are proportional, with a size factor \(s^A\) or \(s^B\), to the concentrations \(q^A_i\) and \(q^B_i\): \[\mu^A_i = s^A q^A_i \qquad \mu^B_i = s^B q^B_i\]
    • Size factors \(s^A\) or \(s^B\) are estimated as common factors for all genes.
  4. Over all genes, the dispersion parameter \(r\) for the NB distribution is modeled as a smooth function of \(\mu\): \(r = v(\mu)\).
    • Then per gene and condition dispersion parameters are estimated as \(r^A_i = v(\mu^A_i)\) and \(r^B_i = v(\mu^B_i)\).
  5. The observed counts \(K^A_i\), \(K^B_i\) of gene \(i\) are fitted to a Negative-Binomial distribution: \[K^A_i \sim \mathrm{NB}(\mu^A_i, r^A_i) \qquad K^B_i \sim \mathrm{NB}(\mu^B_i, r^B_i)\].
  6. Using the NB distribution, the p-value under \(H_0\) that \(\mu^A_i=\mu^B_i\) is calculated.

Variance-stabilizing transformations of RNA-seq data

Variance-stabilizing transformations for NB-distributed data exist; see Holmes and Huber (2019).

They should be used in applications that work directly on the counts, for example to assess the quality of the data:

  • Hierarchical clustering & heat maps
  • Dimension reduction, like PCA

Multiple hypothesis testing

In a data set with multiple genes we test multiple null hypotheses

Comparing gene expression under conditions A and B

| Gene | \(H_0\) |
|---|---|
| \(G_1\) | \(\mu^A_1 = \mu^B_1\) |
| \(G_2\) | \(\mu^A_2 = \mu^B_2\) |
| \(G_3\) | \(\mu^A_3 = \mu^B_3\) |
| \(\vdots\) | \(\vdots\) |
| \(G_n\) | \(\mu^A_n = \mu^B_n\) |

In a data set with multiple genes we test multiple null hypotheses

Comparing gene expression under conditions A and B

| Gene | \(H_0\) | p-value | test |
|---|---|---|---|
| \(G_1\) | \(\mu^A_1 = \mu^B_1\) | 0.47 | accept |
| \(G_2\) | \(\mu^A_2 = \mu^B_2\) | 0.0001 | reject |
| \(G_3\) | \(\mu^A_3 = \mu^B_3\) | 0.04 | reject |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| \(G_n\) | \(\mu^A_n = \mu^B_n\) | 0.74 | accept |

  • The genes with rejected null hypotheses, \(p < 0.05\), are the potentially interesting ones: the positive calls or discoveries

The trouble with Type I error control

Suppose we measure the expression of 5050 genes

  • Suppose also that 50 genes are truly differentially expressed, and their \(H_0\) is rejected
  • Then the remaining ≈5000 genes are not differentially expressed: their \(H_0\) is true
  • Using an \(\alpha\) level (p-value cut-off) of 0.05 for the test, 5% of these 5000 genes (250) will be false positives

Conclusion

The fraction of false positives in the list of all 300 positive calls equals \(\frac{250}{50+250} = 83\%\)!

Why is that?

Because the fraction of genes for which \(H_0\) is true is very large, namely \(\frac{5000}{5050}=0.99\). Therefore, even though we take \(\alpha=0.05\) we still get a lot of false positives.

What would the fraction of false positives be if we chose a stricter \(\alpha=0.01\)?
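The arithmetic of the worked example can be checked in a few lines (numbers come from the example above; change alpha to explore the question):

```python
# Worked example: 5050 genes, of which 50 are truly differential (H0 false)
# and 5000 have a true H0, tested at alpha = 0.05
alpha = 0.05
true_h0, false_h0 = 5000, 50

false_pos = alpha * true_h0          # expected false positives: 250
true_pos = false_h0                  # assume all 50 real effects are detected
fdr = false_pos / (true_pos + false_pos)

assert round(fdr, 2) == 0.83         # 250 / 300, about 83% false discoveries
```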

What do we want to control?

  • With Type I error control or \(\alpha\)-level we control

\[\alpha=\frac{\text{False positives}}{\text{True negatives}+\text{False positives}}\]

  • We would like to control the fraction of false discoveries, or false discovery rate, i.e. the fraction of Type I errors in all positives (cases in the Reject column):

\[\text{FDR}=\frac{\text{False positives}}{\text{True positives} + \text{False positives}}\]

How can we estimate this fraction?

All genes that we test

Some positives, mostly negatives

Distributions of their p-values

However, the truth is hidden

We obtain a combined distribution of p-values

We can reconstruct part of the truth

It really works: distribution of p-values in the Pasilla data set

  • Using a t-test on transformed rates (counts per million)

Controlling the false discovery rate

By moving the dashed line \(a\) we can change the FDR as desired

Difference between FDR and Type I error control

  • With \(\alpha\)-level control we have no knowledge of the fraction of false positives
  • With FDR control we get (determine) an estimate of the fraction of false positives in a list of selected genes
    • This is essential information for follow-up experiments
  • The p-value is a property of a single hypothesis test, the FDR is a property of a list of positive calls

Using the FDR in follow-up experiments

Based on a list of significant (positive) genes you might:

  • Make knock-outs of each gene.
    Very time-consuming and expensive; use a very low FDR
  • Do a sequence motif analysis.
    False positives may blur motifs; use a low FDR
  • Do a gene-class enrichment analysis.
    False positives as well as false negatives may decrease statistical power; use an intermediate FDR
  • Cluster genes.
    False positives may influence the clustering result; test a range of FDRs

Other method: controlling the Family-Wise Error Rate

  • The family is a collection of hypotheses for which it is reasonable to take into account a combined measure of error
  • The FWER is the probability of obtaining at least one Type I error in the family
  • A well-known, simple procedure to control FWER is the Bonferroni correction:
    • Based on Boole’s inequality: \(P(\bigcup_i^n A_i ) \leq \sum_i^n P(A_i)\)
    • We have a family of \(n\) hypotheses (e.g. \(n\) genes tested)
    • We have a FWER acceptance level of \(\alpha\) (probability of having at least one false positive)
    • Only reject \(H_{0i}\) if the p-value \(p_i \leq \frac{\alpha}{n}\) (e.g. if \(n=1000\) and \(\alpha=0.05\) then \(p_i \leq 0.00005\))
    • Usually much more stringent than FDR (more false negatives)
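The Bonferroni threshold from the example is simple arithmetic:

```python
# Bonferroni correction: with n hypotheses and FWER level alpha, reject H0i
# only when p_i <= alpha / n (example numbers from the slide)
n, alpha = 1000, 0.05
threshold = alpha / n

assert abs(threshold - 0.00005) < 1e-12

# Boole's inequality guarantees P(at least one false positive) <= n * (alpha/n)
fwer_bound = n * threshold
assert abs(fwer_bound - alpha) < 1e-12
```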

Wrap-up

What we have discussed

  • Statistical hypothesis testing:
    • Sources and types of error
    • Hypothesis test and p-value
  • Count data
    • Sources of error in RNA-seq
    • Normalization
    • Structure of error (not Poisson!)
  • Variance stabilization
    • How to analyze heteroscedasticity
    • How to alleviate it using transforms
  • Negative binomial model for count data
    • Normalization and hypothesis testing in a package like DESeq
  • Error control in analyses of large data sets
    • Investigating p-value distribution
    • Controlling the False Discovery Rate
    • Other method of FWER control: Bonferroni correction

Literature and background

Literature and remarks

Further reading and background

Holmes and Huber (2019), online book, free access.

  • Chapter 4, Gamma-Poisson or Negative Binomial distribution, Variance stabilization, zero-inflated data (4.4)
  • Chapter 8, High-throughput count data (8.1-8.4, 8.8, 8.10)
  • Chapter 6, Multiple Testing (6.7-6.10)

Please, browse through the book: other chapters may be relevant for this course and for future reference

Remark about normalization

Note that the term normalization has different meanings in statistics. You will likely encounter both!

  • In this slide it is used to indicate the correction of bias
  • It is also used to indicate the operation of scaling random variables

Remarks about the NB distribution

The Negative Binomial distribution \(\mathrm{NB}(X=k)\) with parameters \(r\) and \(p\) can be obtained in different ways. One is the following:

  • \(k\) is the number of tails you get when you repeat a Bernoulli trial (coin flip) until you have observed \(r\) heads. In a single Bernoulli trial the probability of throwing head equals \(p\), and of throwing tail \(1-p\)
  • The mean number of tails, \(\mu\), in such an experiment equals \(\frac{r(1-p)}{p}\)

Another way to generate the NB distribution is as an infinite mixture of Poisson distributions whose parameters \(\lambda\) are themselves drawn from a gamma distribution; see Ch. 4 in Holmes and Huber (2019). This is why the NB distribution is also known as the gamma-Poisson distribution.
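The mixture construction is easy to verify by simulation; the parameter values below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, r, n = 50.0, 5.0, 200_000

# Draw Poisson parameters lambda from a gamma distribution with shape r and
# scale mu/r, then draw one Poisson count per lambda: the resulting counts
# follow a negative binomial (gamma-Poisson) distribution with mean mu
lam = rng.gamma(shape=r, scale=mu / r, size=n)
counts = rng.poisson(lam)

# Mean is close to mu; variance is close to mu + mu^2/r (over-dispersed)
```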

Remarks about FDR

  • You may encounter the concept local FDR (often indicated by lower case fdr), e.g. in Holmes and Huber.
    In the slide about the FDR, the local fdr is the ratio of the blue line segment to the total (blue + red) line segment on the line \(a\), whereas the total FDR is the ratio of the blue area to the total area left of the line \(a\) (the areas are the integrals of the line segments). The local fdr estimates the probability that an individual positive call is a false positive.

  • You may encounter the concept q-value.
    In a list of hypothesis tests ordered by increasing p-values, the q-value of a test is the FDR (expected fraction of false positives) of the collection of hypothesis tests up to and including that test.

  • You will often encounter the terms “corrected p-value” or “adjusted p-value” in the literature when the false discovery rate is meant.
    Do not use these terms: they are confusing and imprecise because the meaning of an FDR is very different from that of a p-value. Make sure that you understand this comment.

  • See Päll et al. (2023) about the importance of checking the distribution of p-values.

Answers to questions

Answers to Section 1.1.10

  1. Incorrect. We do not know this probability.
  2. The answer is not: 96%. The answer is: We do not know.
  3. Incorrect. We do not know this probability.

These questions can only be answered conditional on the fact that the null hypothesis is true. But in none of these cases do we know the probability that the null hypothesis is true (see definition of p-value).

Advanced: algorithms for controlling the FDR

Controlling FDR using the Benjamini-Hochberg algorithm

The BH algorithm (Benjamini and Hochberg 1995)

  1. Fix the false discovery rate \(\alpha\) and let \(p_{(1)} \leq p_{(2)} \leq \ldots \leq p_{(m)}\) be the ordered \(p\)-values
  2. Associate quantiles \(q_j = \frac{j}{m}\) with each \(p_{(j)}\)
  3. Define \[k = \max\left\{ j : p_{(j)} \leq \alpha \cdot q_j\right\}\]
  4. Reject all hypotheses \(H_{0j}\) for which \(p_{(j)} \leq p_{(k)}\)

Using the BH rejection threshold we get: FDR \(\leq \alpha\)
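The four steps above translate directly into code. A minimal NumPy sketch (the function name and example p-values are my own, for illustration):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of rejected hypotheses at FDR level alpha."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)                       # indices of ordered p-values
    q = np.arange(1, m + 1) / m                 # quantiles q_j = j/m
    below = p[order] <= alpha * q               # compare p_(j) with alpha * q_j
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.where(below)[0])          # largest j satisfying the bound
        reject[order[: k + 1]] = True           # reject all hypotheses up to p_(k)
    return reject

# Small illustrative example: three clear signals among larger p-values
pvals = [0.0001, 0.0004, 0.002, 0.3, 0.5, 0.7, 0.9, 0.95]
rejected = benjamini_hochberg(pvals, alpha=0.05)
```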

Simulation with different fractions of true H0

  • Call \(m_0\) the number of true H0 cases among \(m\) hypothesis tests
  • We define \(\pi_0 = \frac{m_0}{m}\)

The uniform distribution generates a straight line with an offset that equals \(\pi_0\)
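A small simulation along these lines: true-\(H_0\) tests yield uniform p-values, so the flat right-hand part of the p-value histogram estimates \(\pi_0\). The Beta-distributed alternative p-values are an arbitrary choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
m, pi0 = 10_000, 0.8                     # 80% of hypotheses have a true H0

# True-H0 tests give uniform p-values; alternatives give p-values near zero
m0 = int(pi0 * m)
p_null = rng.uniform(size=m0)
p_alt = rng.beta(0.5, 20, size=m - m0)   # hypothetical alternative, skewed to 0
pvals = np.concatenate([p_null, p_alt])

# The flat part of the histogram estimates pi0: uniform p-values exceed 0.5
# half of the time, alternatives almost never do
pi0_hat = 2 * np.mean(pvals > 0.5)
```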

Understanding algorithms that estimate FDR

The key aspect is to have an estimate \(\hat{\pi}_0\) of \(\pi_0\)

  • Benjamini-Hochberg (1995): very conservative, \(\hat{\pi}_0 = 1\)
  • Benjamini 2000 and Storey algorithms (2000; 2004): less conservative

Benjamini-Hochberg method: geometric explanation

\[p_{(k)}=\alpha \cdot q_k\]

Cited literature

Benjamini, Y., and Y. Hochberg. 2000. “On the Adaptive Control of the False Discovery Rate in Multiple Testing with Independent Statistics.” Journal of Educational and Behavioral Statistics 25 (1): 60–83. https://doi.org/10.3102/10769986025001060.
Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 289–300.
Holmes, Susan, and Wolfgang Huber. 2019. Modern Statistics for Modern Biology. http://web.stanford.edu/class/bios221/book/index.html.
Päll, Taavi, Hannes Luidalepp, Tanel Tenson, and Ülo Maiväli. 2023. “A Field-Wide Assessment of Differential Expression Profiling by High-Throughput Sequencing Reveals Widespread Bias.” Edited by Marcus Munafò. PLOS Biology 21 (3): e3002007. https://doi.org/10.1371/journal.pbio.3002007.
Storey, John D., Jonathan E. Taylor, and David Siegmund. 2004. “Strong Control, Conservative Point Estimation and Simultaneous Conservative Consistency of False Discovery Rates: A Unified Approach.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 66 (1): 187–205. https://doi.org/10.1111/j.1467-9868.2004.00439.x.